Reserved Self-training: A Semi-supervised Sentiment Classification Method for Chinese Microblogs

نویسندگان

  • Zhiguang Liu
  • Xishuang Dong
  • Yi Guan
  • Jinfeng Yang
چکیده

The imbalanced sentiment distribution of microblogs induces bad performance of binary classifiers on the minority class. To address this problem, we present a semisupervised method for sentiment classification of Chinese microblogs. This method is similar to self-training, except that, a set of labeled samples is reserved for a confidence scores computing process through which samples that are less than a predefined confidence score threshold are incorporated into training set for retraining. By doing this, the classifier is able to boost the performance on the minority class samples. Experiments on the NLP&CC2012 Chinese microblog evaluation data set demonstrated that reserved self-training outperforms the best run by 2.06% macro-averaged and 2.30% microaveraged F-measure, respectively.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Semi-supervised Sentiment Classification using Ranked Opinion Words

This work proposes a semi-supervised sentiment classification method which is based on the co-training framework. The proposed method needs to construct three sentiment classifiers. We use common text features to construct the first classifier. We extract opinion words from consumer reviews, and then we ranked these opinion words according to their importance. We also employ extracted opinion w...

متن کامل

یک چارچوب نیمه‌نظارتی مبتنی بر لغت‌نامه وفقی خودساخت جهت تحلیل نظرات فارسی

With the appearance of Web 2.0 and 3.0, users’ contribution to WWW has created a huge amount of valuable expressed opinions. Considering the difficulty or impossibility of manually analyzing such big data, sentiment analysis, as a branch of natural language processing, has been highly considered. Despite the other (popular) languages, a limited number of research studies have been conducted in ...

متن کامل

Every Term Has Sentiment: Learning from Emoticon Evidences for Chinese Microblog Sentiment Analysis

Chinese microblog is a popular Internet social medium where users express their sentiments and opinions.But sentiment analysis onChinese microblogs is difficult: The lack of labeling on the sentiment polarities restricts many supervised algorithms; out-of-vocabulary words and emoticons enlarge the sentiment expressions, which are beyond traditional sentiment lexicons. In this paper, emoticons i...

متن کامل

LCCT: A Semi-supervised Model for Sentiment Classification

Analyzing public opinions towards products, services and social events is an important but challenging task. An accurate sentiment analyzer should take both lexicon-level information and corpus-level information into account. It also needs to exploit the domainspecific knowledge and utilize the common knowledge shared across domains. In addition, we want the algorithm being able to deal with mi...

متن کامل

Employing Personal/Impersonal Views in Supervised and Semi-Supervised Sentiment Classification

In this paper, we adopt two views, personal and impersonal views, and systematically employ them in both supervised and semi-supervised sentiment classification. Here, personal views consist of those sentences which directly express speaker’s feeling and preference towards a target object while impersonal views focus on statements towards a target object for evaluation. To obtain them, an unsup...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013